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Abstract 

Interval-valued linear regression has been investigated for some time. One of the critical issues is 
optimizing the balance between model flexibility and interpretability. This paper proposes a linear model 
for interval-valued data based on the affine operators in the cone C = {(a:, y) G R 2 |a: < y}. The resulting 
new model is shown to have improved flexibility over typical models in the literature, while maintaining a 
good interpretability. The least squares (LS) estimators of the model parameters are provided in a simple 
explicit form, which possesses a series of nice properties. Further investigations into the LS estimators 
shed light on the positive restrictions of a subset of the parameters and their implications on the model 
validity. A simulation study is presented that supports the theoretical findings. An application to a real 
data set is also provided to demonstrate the applicability of our model. 


1 Introduction 

Recently there has been an increasing interest in the linear regression for interval-valued data. See Dia- 
mond(1990), Korner and Nather (1998) , Gil et al. (2002, 2007), Manski and Tamer (2002), Carvalho et 
al. (2004), Billard (2007), Gonzalez-Rodrfguez et al. (2007), Lima Neto and De Carvalho (2008, 2010), 
Blanco-Fernandez et al. (2011), Cattaneo and Wiencierz (2012), for a partial list of references. Existing 
models have been developed mainly in two directions. In the first direction, separate point-valued linear 
regression models are fitted to the center and range (or the lower and upper bounds), respectively, treating 
the intervals essentially as bivariate vectors. Examples belonging to this category include the center method 
by Billard and Diday (2000), the MinMax method by Billard and Diday (2002), the (constrained) center and 
range method by Lima Neto and De Carvalho (2008, 2010), and the model M by Blanco-Fernandez et al. 
(2011). The second direction is to view the intervals as subsets in R and study their linear relationship in 
the framework of random sets. Investigations along this direction include Diamond (1990), Gil et al. (2001, 
2002), Gil et al. (2007), Gonzalez-Rodrfguez (2007), and Sun and Li (2014), among others. In this paper, we 
propose a new linear model for interval-valued data that aims at connecting the two directions and achieving 
improved flexibility. 

To facilitate our presentation, let us give a brief introduction on the theoretical framework of random 
sets. Let (f2,£,P) be a probability space. Denote by !C (R d ) or 1C the collection of all non-empty compact 
subsets of R d . In the space /C, a linear structure is defined by Minkowski addition and scalar multiplication, 
i.e., 

A + B = {a + b : a £ A, b £ B} , A A = {Aa : a £ A} , (1) 

VA, B £ 1C and A £ R. A random compact set is a Borel measurable function A : Cl —> 1C, K, being equipped 
with the Borel er-algebra induced by the Hausdorff metric. For each X £ 1C (R d ), the function defined on 
the unit sphere 

Sx ( u ) = sup (it, x ), Vit £ S d ~ x 
xex 
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is called the support function of X. If A(u>) is convex almost surely, then A is called a random compact 
convex set. (See Molchanov 2005, p.21, p.102.) The collection of all compact convex subsets of R d is denoted 
by K-c (R d ) or Kc- Especially, when d = 1 , /Cc(R) contains all the non-empty bounded closed intervals in 
R. A measurable function X : Q — > Kc (R) is called a random interval. Much of the random sets theory has 
focused on compact convex sets (see, e.g., Artstein and Vitale (1975), Aumann (1965), and Lyashenko (1982, 
1983)). Let S be the space of support functions of all non-empty compact convex subsets in K-c ■ Then, S is 
a Banach space equipped with the L 2 metric 


IMiOlh 


d [ \sx{u)\ 2 y (du) 

. Is- 1 - 1 


where p is the normalized Lebesgue measure on S'*— 1 . According to various embedding theorems (see Rad- 
strom 1952; Hormander 1954), K-c can be embedded isometrically into the Banach space C(S) of continuous 
functions on S d ~ l , and S is the image of K-c into C(S). Therefore, <5 (A, Y) Hs^ — sy|| 2 , VX, Y £ Kc , 
defines an L 2 metric on Kc- 

The central idea to constructing linear models in the random sets framework is to minimize the distance 
S (Y, E(Y\X)) on the data, where X, Y are random intervals and E(Y\X) is a linear function of X in the 
sense of Such models have very nice mathematical interpretations, but the restriction to the space 
Kc (R) unfortunately results in a reduced flexibility from practical point of view. Notice that 

(aX + b) C = aX G + b, 

(.aX + b) R = \a\X R . 


This implies that the slope parameters of the corresponding linear model for the center (C) and range (R) 
must be the same in absolute value (see, e.g., Gil et al. (2002) and Sun and Li (2014)). Such a restriction is 
usually relaxed in the models developed for the center and range separately. Those models typically treat an 
interval as a vector in the Euclidean space R 2 and minimize the Euclidean distance ||Y — E (Y|X)|| on the 
data. However, this approach is slightly problematic in that once the intervals are represented by vectors in 
R 2 , they should be modeled as such, as opposed to being broken down to the centers and ranges separately. 
Particularly, a linear model in R 2 in general takes on the form 


Y = AX + b + e, 


( 2 ) 


where A is a 2 x 2 coefficient matrix, b is a 2 x 1 intercept vector, and e is a 2 x 1 error vector. There is no 
reason to separate the two coordinates of X by forcing A to be diagonal. This problem makes the bivariate 
types of models hard to interpret both in Kc (R) and in R 2 . 

Our main contribution in this paper is to generalize the bivariate types of models from the literature (i.e., 
models in the first research direction by the preceding discussion) to the form ([2j , which is accomplished by 
embedding the space Kc (R) into R 2 , and more precisely, into the cone C = {(x,y) £ R 2 |x < y}. As such, 
our proposed new linear model has generally improved flexibility over the existing models in both directions. 
It is also well interpretable in C due to the embedding. We extend the univariate model to the multiple 
case and derive the matrix form of the general multivariate model. The least squares (LS) estimates for the 
model parameters are provided in matrix form, from which a series of properties are derived. Furthermore, 
we give explicit analytical LS solutions for the positive parameters, which shed light on the behaviors of 
the LS estimators in connection with the positive restriction and the model validity. Simulation studies are 
carried out that produce consistent results with our theoretical findings. Finally, an application to a real 
data set is presented to demonstrate the applicability of our model. 

The rest of the paper is organized as follows. Section 2 formally introduces our model and discusses the 
associated model properties. The LS estimators and their properties are presented in Section 3, followed 
by a rigorous discussion on the estimation of the positive parameters in Section 4. Simulation studies are 
reported in Section 5, and the real data application is presented in Section 6. We give concluding remarks 
in Section 7. Technical proofs are collected in the Appendix. 
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2 The linear model 

2.1 The affine operator in C and the univariate model in /Q;(M) 

Assume observing an i.i.d. random sample of paired intervals X,; = [X 7 . X] 7 ], Yi = [Y/ J , Yf 7 ], i = 1, • • • , n, 
where X 7 , Y 7 and Xf, Y t u are the lower and upper bounds of Xi and Yi, respectively. Alternatively, the 
interval X, can also be represented by its center Xf and range Xf as 

Xf = (Xf + Xf)/2, 

v R _ vU vL 

and similarly for Yj. The (5-metric in the space KLc (R) is given by 


5 (X 1; X 2 ) = s j l - (Xf - Xf ) 2 + l (Xf - Xf). 

This suggests that the metric space (Xc(R),(5) can be embedded isometrically into the cone C = {(x,y) £ 
R 2 |x < y} equipped with the Euclidean metric. Therefore, we consider each interval X = [X L , X* 7 ] £ /Cc(R) 
to be represented by the point (X i ,X c/ ) £ C. 

From the preceding discussion of embedding, we propose to construct a linear model in /Cc(R) based 
on the affine operator in C, i.e., affine operator T : R 2 —>• R 2 satisfying T(C) C C. Obviously, such affine 
operators are represented by 

V 

y + e\' 

with a,j9,ij £ I and 7, 6 > 0. This leads us to propose the following univariate linear model 

Ff = oXf + /3Xf + r) + ef, (3) 

Y? = (a- 7 )Xf + (/3 + 7 )Xf + j 7 + 0 + ef, (4) 

where a, f 3 , r\ £ M, 7, 9 > 0 are coefficients, and {ef , ef} are i.i.d. zero mean random variables with variance 
<t 2 > 0, i = 1, • • • , n. 


X 


a P 


X 

y 

r 

a — 7 P + 7 


y 


2.2 Collinearity preservation 


The most important property of affine transformation is that it preserves collinearlity. This in the cone C 
means that points lying on a ray are still on a ray after transformation. Precisely, the operator T maps the 
ray 

y = ax + b, y > x 


into another ray 


T(y) 



7 («- 1) 
a + /3a 


T (; x ) +76 + 0 


7 (a - 1) 
a + /3 a 


Wb + rj), T (y) > T (x). 


Figure |T] gives an illustration of this effect. Considering the equivalence of the point (XfX* 7 ) £ C and 
the interval [X L ,X U ] £ /Cc(R), we define a collection of intervals {[Xf, Xf ] : i £ /} to be collinear if their 
representations {(X, L . Xf) : i £ /} in C are on a ray. 

Definition 1. A collection of intervals {[X 7 ', X( ; ] : * £ /} are said to be collinear if they satisfy the equation 

XY = aX 7 f + b, (5) 

where Xf > - ^-r if a > 1, X / 7 <- \ if a < 1, and b > 0 if a = 1. 

i — a — 1 J > 1 — a— 1 J 1 — J 
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It is easily seen that equation © can be equivalently expressed as 

X R = —[(a — 1) X c + b\ . 
a+ 1 L ' 1 

So, we can also define collinearity in terms of the center and range of the interval. 

Definition 2. The collinearity of a collection of intervals { [X R . X- J \ :i€l} is equivalently defined by 

X R = cX? + d, (6) 

where Xf > — | if c > 0, Xf < | if c < 0, and d> 0 if c = 0. 

From these two definitions, collinearity of intervals essentially means that the upper bound changes 
linearly with the lower bound, or equivalently, the range changes linearly with the center. For example, it is 
a common situation in practice that a larger center is associated with a wider range. When this relationship 
is linear, the corresponding intervals are considered collinear, and such a characteristic gets preserved under 
the operator T. Figure [2] provides a visualization of this property. In terms of modeling, if an interval-valued 
data Xi = \xf,xY], yt = [yf, y R ] , i = 1 , • • • , n, follows our model ©-([ 2 ]), then for xfs that are collinear, 
their associated yf s are also collinear. 




Figure 1: A graphical illustration of the affine transformation in C, which is above the line y = x in R 2 . The 
solid line is a ray y = ax + b, y > x in C, and the dash-dotted line is its image by T with the parameters 
a = —2, f3 = 4,7 = 5, r] = 1 ,8 = 3. Left: a = 2, b = 1; Right: a = —2, 6=1. 


2.3 Comparison to other models 

As we mentioned in the introduction, our model has systematically improved flexibility over typical models in 
the literature. In this section, we compare our univariate model ©-([4]) to two popular models to gain more 
insight into this. Consider the M model proposed by Blanco-Fernandez et al. (2011), and the constrained 
center and range method (CCRM) by Lima Neto and De Carvalho (2010). The M model is specified as 

Yp = cnXf + 7 + midej, (7) 

Y r = \/3\X R + sprei, ( 8 ) 


where mide^ and spre^ are center and range of the interval-valued random error a, respectively, mide^ is 
assume to be a centered random variable and spre^ is assumed to be a positive random variable. On the 
other hand, the model of CCRM is defined as 


Y c 

l 

\xR 
1 i 


+ + ef, 


A? + f3 R X R + , 


( 9 ) 

( 10 ) 
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Figure 2: Visualization of collinear intervals. Each interval is displayed as a horizontal line segments, and 
the intervals are elevated proportionally in order to be displayed in one plot. The left two plots are collinear 
intervals that satisfy equation ((Sj) with (top) a = 2,b = 1 and (bottom) a = —2 ,b = 1. The right two plots 
are their corresponding images by T with parameters a = —2, /3 = 4, 7 = 5, rj = 1, 9 = 3. 


where ep and e P are both centered random variables without any geometric interpretations. The two coef¬ 
ficients (3q , Pi in the range regression equation are both restricted to be positive to ensure the positiveness 
of fyfy It is easy to see that these two models are essentially equivalent with ep = mide*, P^ = E (spre,), 
and ef = sprej — E (spre^). Rewriting equations (Iffll- flTOll in terms of the lower and upper bounds, the model 
of CCRM is equivalently represented as 


Y- = 


Y- = 


ft - + \ v? 

P% + \l$ + \ {P? - 


■ Pi) X? + \ (P? ~ Pi) X? + e? ~ \e«, 
Pi) X* + \ {Pi + Pi) Xf + ef + ief , 


( 11 ) 

( 12 ) 


where /3jp > 0 , Pi > 0 . This compared to our model (O-(SI) is a reduced form with the restrictions a = P + j. 
So our model has one extra degree of freedom, which will drastically expand the model flexibility. The CCRM 
is extended to the multiple case, from which the advantage of our general model introduced in the following 
gets multiplied. We will elaborate more on this in the simulation and real data application sections. 


2.4 Matrix form of the general model 

Consider the general case involving the outcome interval Y) = [ipfyYp 7 ] and p interval-valued predictors 
Xj^i = [Xp i; XV], i = 1, • • • , n; j = 1, • • • ,p. To model fy by a linear transformation of (Vfy,; : j = 1, • • • ,p}, 
we extend the univariate model (0-([4]) to the following form: 

Xt = ^2 { a jX2 + PjXV.) + 77 + ef , ( 13 ) 

l=i 

= "22 [( a i ~ 7 j) Xj,i + (Pj + 7 j) XX] + V + 0 + ef, (14) 

l=i 
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j = I)-" ,P- 

Define 


E (ef) = 

E ( 

4) 

= 0, and 

Var (e 

Y) = 

Var(ef) 

= a 2 
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Then equation (1T51) is expressed as 
Define 

Y u = 


Y \ 

y 2 u 


Y u 


Y L =X 1 (3 1 +e L . 


1 *5 

1 V 2 x* 2 


and 


X, = 


1 x* n x* n 


02 = 


Then equation (fl4l) is rewritten as 

To jointly express the model, define 

Y = 


Jp. 

Y u = [Xr X 2 ] 


01 
02 


y R ' 

A p, 2 


yY p,n m 


rY L i 


Xi 

0 


173,1 


VI 


^ X — 

X-, 

X 2 

, 0 = 

r' 1 

02. 

, and e = 



Then, the general model (|T3 ]l - (fT4]) . can be written in the matrix form 

Y = X/3 + e. 


(15) 


3 Least squares estimation 


We define our least squares estimator (3 of (3 as the minimizer of the sum of squared lower and upper bound 
errors. Namely, 


13 = argmin 





(16) 
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where 


YF 


y. { a i x f,i + ^y,i )+ 7 ?> 

i=i 

y [(%•- 7 j) -y*+ {Pj+ij) y,i\ +11+0. 


(17) 

(18) 


1=1 


This is equivalent to minimizing the sum of squared (5-distance in the metric space (/Ce (M), S). Theorem Q] 
gives the explicit analytical expression of (3 in matrix form. 

Theorem 1. Consider the linear model E3 )~P4))> or equivalently, its matrix form ESP- If the data matrix 
X has full rank, then the least squares estimate (3 defined in GSD is given by 


a =(* t jc) 


-1 


x t y. 


(19) 


Departing from its matrix form, a series of nice properties of $ follows immediately from the classical 
theory of linear models. (See, e.g., Seber (1997).) We summarize them in the following corollaries. 

Corollary 1 . (3 in Theorem Q] is unbiased. 

Corollary 2. (3 in TheoremUjis consistent. 

Corollary 3. The variance-covariance matrix of /3 is 


Cov 


(*M 


= (X 1 X) 1 a 2 . 


( 20 ) 


Corollary 4. An unbiased estimator of a 1 is given by 


(y-xp) t (y-xp) 

2n — 3p — 2 


( 21 ) 


4 Positive restrictions 


The model setting requires that > 0, 6 > 0, j = 1, • • • ,p. However, (3 given in (1191) does not automatically 
guarantee these conditions. In this section, we thoroughly discuss these positive restrictions for the least 
squares estimation and their implications on the model fitting. We begin by making a few notations and 
assumptions. 

Notation 1. Denote by X^f andY v the random variables from which ^X^and, {YF}" t are samples, 
respectively, where k = 1, • • • ,p and V £ {L , U}. 

Notation 2. Denote by 


^ = fey fey 


the sample covariance of Xj} and Xj 7 , k, j = 1, • • • ,p. Similarly, denote by 


the sample covariance of Xj} and Y R , k = 1, • • • ,p. 
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Assumption 1. The ranges of the predictors {X R ,j = 1, • • • ,p} are mutually uncorrelated. 

Assumption 2. The range of each predictor X R is empirically positively correlated with the range of the 
outcome Y R , i.e., Sj > 0 for j = 1, • • ■ ,p. 

From the model specification (113l) - (fT4l) . it is seen that 

= + * + *?, ( 22 ) 

j =i 


where e R = ef — ef , i = 1, • • • , n. This immediately implies the following results, which give interpretations 
of the positive parameters 6 and jj, j = 1, • • • ,p. 


Proposition 1 . Assume model 113\)-[T4\). Then, 


'Cov 


'Cov (X R , X R ) 

■ ■ cov(x R ,x R y 


7i 

Cov (X r ,Y r )_ 


Cov(X R ,X R ) 

■ • Cov{X R ,X R )_ 


1p. 


2. e = E(Y R )-Y!’ J=1 T 3 E{X?). 

It can be shown that the positive parameters {9. 7 ? -. j = 1, • • • ,p} are indeed estimated independently 
from the rest of the parameters {ctj, /3j,j = 1, • ■ ■ ,p}. We list their analytical LS solutions separately in the 
following theorem. 


Theorem 2. Consider model Define the sample variance-covariance matrix of {X R , j = 1, • • • ,p} 

as 


Y x r — 




2=1 


2=1 


2=1 


:= [Sk,] P kj=1 ■ 


k,j—l 


Additionally, denote by T, x r y r the vector that contains the sample covariances of X R andY R , k = 1, 


■ ■ ,P, 


j x r .y r — 


rR 


i=l 


i=l 


■■= [ 5 fc ] Li ■ 


fc= 1 


Let F = [71, • • • , 7 P ] • Then, the LS estimator f is the solution of the linear system 



— Y, x r y r , 

(23) 

and the LS estimator of 6 is 

e = YE-j2%ccf. 

1=1 

(24) 

From (123H-(|24l) and in view 

of Proposition [U we see that 7 j,j = 1, • 

• • ,p| are essentially moment 


estimators of the underlying parameters, which are in fact strongly consistent. This also explains from 
another perspective the consistency shown in Corollary [2] In particular, an important interpretation of 
Theorem [5] is that if at least one of the positive parameters is estimated to be negative for a large sample 
size, it indicates that the underlying true parameter is negative with a high probability, and forcing the 
parameter to be positive may result in possible biases. We give a simplified example in the following 
Corollary to illustrate the implication of the positive restriction on {7 3 ,j = 1, • • • ,p}. 

Corollary 5. Consider the univariate model Let 7 be the LS estimate and 7 be any constrained LS 

estimate of 7 such that 7 > 0. If 7 < 0, then 



2=1 2=1 














where Y R is the predicted value for Y, R based on the constrained LS estimates. The “=” holds if and only if 
7 = 0. 

For the univariate model, if the LS estimate of 7 is negative, forcing it to be positive will result in the 
model being worse than the constant model Y R for the range. Similar biases are expected for the multivariate 
cases too. Therefore, it is not recommended that a constrained optimization algorithm always be used to 
ensure positive estimates, if some of the LS estimates {7 j,j = 1, • • • ,p} are negative. At least a different 
model that accounts for the negative LS estimates should be considered as an alternative to the constrained 
linear model. In practice, it is often assumed that the predictors {Xj, j = 1, ■ ■ • ,p} are independent. We 
provide a sufficient condition under which the LS estimates {7^, j = 1, • ■ • , p} are positive with probability 
converging to one. 

Corollary 6. Under Assumption^! J 7 j > 0 with probability going to one if Assumption^ is met. 

Intuitively, under the circumstance of independent predictors, model (fT51) - (ll4D implies that 

Cov(X R ,Y R ) > 0, j = l,...,p. 

Consequently, for data that the model is appropriate for, the sample covariances {Sj,j = 1, • • • ,p} are 
positive almost surely, which by Theorem [5] is sufficient to ensure the positiveness of {7 j,j = l,-- - ,p}. 
Otherwise, the 7 j ’s can be negative, but that is essentially because one or more of the predictors are negatively 
correlated with the outcome in range and hence the model is not appropriate. 

From the preceding discussion, if 7 j > 0, j = 1, • • ■ , p, it means that the model fits the linear structure of 
the data very well. At this point, if 0 < 0, it may not be worth forcing it to be positive using a constrained 
optimization, as that may bring unnecessary biases. The following theorem gives a guidance of judgment for 
such a situation. 


Theorem 3. Assume model 11 3\) - [lf\ ), or its equivalent matrix form 11511 . Let 

£* = £ l3 X*+0 (25) 

i=i 


be the model predicted value for Y R . Then, 


P 


(y r < 0) 


< 


Var (Y r ) - Var (tf) 2(J 2 


to*)' 


to*) 


2 ’ 


(26) 


Given a negative 9, it is possible to get negative predicts for Y R . However, if the unexplained variance 
of Y r is very small compared to the scale of (Y R ^ , the chance to get a negative predict is tiny, and the 
rare cases of negative predict, if happened, can be rounded up to 0. In practice, the unexplained variance of 
Y r is estimated by 2d 2 , which is then compared to the scale of ( Y R y from the data to decide whether to 
stay with the negative unbiased LS estimate 6 or resort to a constrained LS estimate. 


5 Simulation 

We present a simulation study to demonstrate the empirical performance of the LS estimates and compare 
our model to some peer models in the literature. In particular, we consider the following four model config¬ 
urations: 


• I: p=l, ip ai,/3i ~ Unif(0,4), $,71 ~ Unif(l,3), and ef,ef ~ Unif (0, er 2 ) with a ~ Unif(2,4), 
i= l,--- ,n; 

• II: p=l, r),ai,/3i ~ Unif (—4,0), 0,71 ~ Unif (1,3), and e R ,eY ~ Unif(0,<r 2 ) with <7 ~ Unif (2,4), 
i= 1, • • • ,n; 
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• III: p=3, r],aj,(3j ~ Unif(—4,4), 0, 7 ,- ~ Unif(l,3), j = 1,2,3, and ef,ef ~ Unif (0, er 2 ) with a ~ 
Unif (2,4), z = 1, • • • ,n. 

The first two are univariate models, with positive and negative interval correlations between Y and X\. 
respectively. Figure [3] shows a plot of simulated data with n = 100 observations from each of the two models. 
The third one is a 3-dimensional model, with Y and Xj, j = 1, 2, 3, either positively or negatively correlated. 
A particular data with n = 100 observations simulated from this model is visualized in Figure |4l where it is 
seen that Y is positively correlated with both X\ and Xi , and negatively correlated with A 3 . 




Figure 3: Plots of simulated data from model I and II, respectively, each with sample size n = 100. 


To investigate the empirical performance of the LS estimation, we simulate 500 independent data from 
each of the three model configurations and calculate the LS estimates of the parameters for each simulated 
data. The results are summarized into Table [T] The mean relative error (MRE) for the estimated coefficient 
matrix (3 and variance of error er 2 , given a fixed sample size n, are defined as 


MRE (3) 


1 

500 ^ 110*11 


where ||-|| denotes the Euclidean norm, and 


MRE 


(a 2 ) = — 
V J 500 


500 


\<r k - u%\ 


k =1 


respectively. We simulate observations for each X, independently, so Assumption Q] is automatically satisfied. 
Assumption [2] is checked before we compute the LS estimates for the parameters for each simulated data. If 
it is satisfied, then (3 is calculated by (fTTJl) . which according to Corollary |j] produces positive jj, j = 1, ■ • • ,p 
with probability going to one. If otherwise Assumption [5] is violated, a constrained optimization algorithm 
is employed to calculate 0, with the constraints that 7 j > 0, j = 1, • ■ • ,p and 9 > 0. For this paper, we 
have used the Matlab function fmincon.m to compute the constrained LS estimates. Consistent to our 
theorems, we see that the MRE’s for both 0 and a 2 converge to 0 as sample size increases. Especially, if the 
model really fits the data, which is the case for our simulation, the unconstrained LS estimate given in (fTTJl) 
is sufficient, without the need of a constrained optimization algorithm, with probability going to one. 

Next, we carry out more delicate investigations into the parameter estimation with a particular model 
randomly generated from configuration III. The exact parameter values are listed in the second column 
of Table [2] We simulate a random sample of size n = 300 from this model and estimate the coefficient 
matrix /3 using the algorithm described in the preceding paragraph. The procedure is repeated for 500 times 
independently, and the mean estimates and mean variances are reported in columns 3 and 4, respectively. 
It is seen that the mean estimates are very close to the corresponding true values. The empirical variances 
of these estimates for the 500 repetitions are displayed in column 5, which are satisfactorily close to the 
calculated variances in column 4. 
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Table 1: Evaluation of the LS estimation for simulated data based on 500 independent repetitions. 



n 

MRE (/3) 

MRE (a 2 ) 

U nconstrained 

Constrained 

Model I 

100 

0.4764 

0.0982 

496 

4 


200 

0.3806 

0.0796 

499 

1 


300 

0.3539 

0.0726 

500 

0 


400 

0.3386 

0.0695 

500 

0 

Model II 

100 

0.4468 

0.0985 

499 

1 


200 

0.3844 

0.086 

499 

1 


300 

0.3613 

0.0764 

500 

0 


400 

0.3331 

0.0728 

500 

0 

Model III 

100 

0.4581 

0.0832 

460 

40 


200 

0.3305 

0.0546 

473 

27 


300 

0.2666 

0.0463 

478 

22 


400 

0.2352 

0.0391 

481 

19 


Table 2: Parameter estimation for one particular model based on 500 independent repetitions. 


Parameter 

True Value 

Mean Estimate 

Estimated Variance 

Empirical Variance 

1 

1.4932 

1.4499 

2.0048 

1.8581 

ai 

1.6419 

1.6457 

0.0520 

0.0527 

Pi 

1.5542 

1.5527 

0.0521 

0.0524 

Ct 2 

-1.8902 

-1.9098 

0.0521 

0.0557 

P 2 

-3.2780 

-3.2585 

0.0521 

0.0558 

Ot 3 

-2.4036 

-2.3967 

0.0518 

0.0519 

P 3 

-1.8451 

-1.8528 

0.0518 

0.0508 

e 

1.7999 

1.8149 

3.8753 

3.5164 

7i 

1.2086 

1.2276 

0.1033 

0.1031 

72 

2.5633 

2.5347 

0.1035 

0.1112 

73 

2.5436 

2.5477 

0.1028 

0.0971 
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Figure 4: Plots of Y against X\. X 2 , and X 3 , respectively, of a simulated data from model III with sample 
size n = 100 . 


Finally, we compare our linear model to the M model by Blanco-Fernandez et al. (2011) and the con¬ 
strained center and range method (CCRM) by Lima Neto and De Carvalho (2010). We presented in Section 
12.31 that these two models are essentially reduced forms of our model. Here we give empirical evidence based 
on their predicting performances. We simulate 500 independent samples from Model I, II, and III, with 
training sample size n = 60,100, 200, 300, respectively. For each sample, we simulate another n /4 observa¬ 
tions as the validation set. We use the mean squared error (MSE) of the center, radius (half-range), and the 
interval as a whole, for the validation set as our measures of predicting performance. Specifically, they are 
defined as 


4 n / 4 2 

MSEC = > 

1 2=1 

4 n / 4 2 

MSER = -E(E-*r) , 

i=1 

MSEI = MSEC + MSER. 

The LS solution for the parameters of the M model is calculated according to the formulas given in Blanco- 
Fernandez et al. (2011). The CCRM is implemented using the R function ccrm in the iRegression package. 
The M model was only developed for the univariate case, so it is excluded in the multiple case (Model 
III). Numerical results for comparing the three methods are shown in Table [3] Just as we expected, the 
performance of our model is consistently significantly better than the other two models across different model 
configurations and sample sizes. Especially for Model III, the average MSEC of the CCRM is about 3 times 
bigger than that of our model, which results from the increased number of predictors. That is, the expanded 
flexibility of our model increases proportionally with the size of the model. The more predictors we include 
in the model, the more increased flexibility we have over the CCRM and the M model. 
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Table 3: Comparison of CCRM and our model for simulated data based on the average of 500 independent 
repetitions. 





M 



CCRM 


Our Linear Model 


n 

MSEC 

MSER 

MSEI 

MSEC 

MSER 

MSEI 

MSEC 

MSER 

MSEI 

Model I 

60 

6.3398 

4.6962 

11.0361 

6.2085 

4.8585 

11.067 

4.8214 

4.1395 

8.961 


100 

6.1048 

4.4821 

10.5869 

6.1318 

4.7239 

10.8557 

4.775 

3.9936 

8.7686 


200 

6.1018 

4.6074 

10.7092 

6.0296 

4.5987 

10.6283 

4.8072 

4.0166 

8.8238 


300 

5.991 

4.494 

10.4849 

5.9705 

4.7316 

10.7021 

4.6907 

3.9587 

8.6493 

Model II 

60 

6.5056 

4.4005 

10.9062 

6.4052 

4.8404 

11.2456 

4.9052 

4.0148 

8.92 


100 

6.178 

4.567 

10.745 

6.0754 

4.6265 

10.7019 

4.7378 

3.9721 

8.7098 


200 

6.1372 

4.6152 

10.7525 

6.0185 

4.6246 

10.6431 

4.6802 

3.9186 

8.5988 


300 

6.0026 

4.4992 

10.5018 

5.9019 

4.7009 

10.6029 

4.5837 

3.9066 

8.4903 

Model III 

60 

_ 

_ 

_ 

14.1623 

5.1865 

19.3488 

5.1949 

4.9172 

10.1122 


100 

- 

- 

- 

13.2387 

4.8486 

18.0873 

5.0919 

4.8285 

9.9205 


200 

- 

- 

- 

13.3159 

4.7531 

18.069 

4.7472 

4.6946 

9.4418 


300 

- 

- 

- 

13.1125 

4.825 

17.9375 

4.6887 

4.634 

9.3227 


6 A real data application 

In this section, we apply our linear model to analyze an interval-valued climate data provided by the National 
Oceanic and Atmospheric Administration (NO A A) and publicly available. The data contains three variables. 
The outcome variable, which we denote by Y, is the average [minimum, maximum] temperature in July 
based on weather data collected from 1981 to 2010 by the NOAA National Climatic Data Center of the 
United States. The first predictor Xi is the corresponding average temperature range in April. The second 
predictor X 2 is the [morning, afternoon] relative humidity in July averaged for the years 1961 to 1990. 
Relative humidity measures the actual amount of moisture in the air as a percentage of the maximum 
amount of moisture the air can hold, and it corresponds negatively to the temperature. All the three 
interval-valued variables are observed for 51 large US cities. By this analysis, we aim to model the summer 
(July) temperature by an affine function of the spring (April) temperature and the July relative humidity. 
We randomly split the full data into a training set of 40 observations and a validation set of 11 observations. 
Figure [5] plots Y against Xi and X 2: respectively, for the training set. It is checked that 1) the data matrix 
X has full rank; 2) the sample correlation of X R and X R has a p-value greater than 0.05; 3) the sample 
correlations of X R and Y R , * = 1,2, are both positive. So all the theoretical results we developed in Section[3] 
and 0] should apply. This means that with large probability we can get the LS estimates f3 simply by formula 
m, without any constrained optimization algorithm. The LS estimates of the parameters are found to be 

r) = 45.7740, 6 = 1.5419, 

ai = 0.2069, /3i = 0.3392, 71 = 0.8839, 

a 2 = -0.0593, = -0.0948, 72 = 0.0234, 

and the estimated variance of residual according to Corollary [4] is 

a 2 = 16.0454. 


It follows that the fitted linear model is 

Y l = 45.7740 + 0.2069Xf + 0.3392X 1 C/ - 0.0593X 2 L - 0.0948Xf + e L , 
Y u = 47.3159-0.6269X^ + 1.1731X^-0.0826X^-0.0714X7 + ^, 
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where e L ,e u are i.i.d. random variables with mean 0 and variance 16.0454. For comparison purposes, we 
also fit a CCRM model to the data, which turns out to be 

Y c = 57.2428 + 0.5768Xf - 0.1921Xf + e c , 

Y r = 1.5419 + 0.8339Xf + 0.0234A 2 fi + e^, 

where the random errors e c ,e R have both means 0, and variances 19.7509 and 12.9017, respectively. The 
predicting performances on the validation set of both models are reported in table [|] Consistent with our 
theoretical analysis in Section 12.31 and our simulation study in Section [5j our linear model has much more 
flexibility than the existing reduced models such as CCRM, which leads to the much improved predicting 
performance even for a small data set as presented here. 
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Figure 5: Left: plot of July temperature versus April temperature. Right: plot of July temperature versus 
July relative humidity. 


Table 4: Predicting performance comparison of CCRM and our model for the real data. 



MSEC 

MSER 

MSEI 

Our Linear Model 

8.7484 

4.1004 

12.8488 

CCRM 

10.8631 

4.1004 

14.9634 


7 Conclusion 

We have introduced a linear model for interval-valued data based on the affine operators in the cone C = 
{(x, y) £ R 2 |x < y}. The new model is shown both theoretically and empirically to have improved flexibility 
over the existing models in the literature. We present the general model for multiple predictors in matrix 
form, from which the LS estimators of the model parameters are immediately derived with a series of nice 
properties from the classical theory of linear models. Some parameters have positive constraints, which we 
show are closely related to the intrinsic structure of the model. Therefore, it is not recommended to blindly 
force these parameters to be positive with a constrained optimization algorithm. Instead, it is better to let 
the data speak for itself by the unconstrained LS estimates and decide later whether to employ a constrained 
optimization algorithm or resort to a different model, according to the guideline we have provided in the 
paper. 
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A Proofs 

A.l Proof of Proposition [Tj 

Proof. From (l22l) . since e R and X R are uncorrelated, 

X R , TA-f + 6 + ^ I = E ^ Cov (E E) > k = 1 , • • • ,p, 

3=1 / i =1 

from which the first result follows. Taking expectations on both sides of (1^1) yields the second result. □ 


Cov (Xf, Y r ) = Cov 


A.2 Proof of Theorem [2] 


Proof. Differentiating 


(Y R -Y R ) 2 + (Yy-YPy 


we obtain the system of equations 


Y, u - Y/ J ) =0 


n 

n 

2= 1 
n 

E(EE l ) = o, 

2=1 

n 

EE(E-E) = 0 , 

2=1 

n 

Y, X k,i(Yf-Yf) = 0 , 

2=1 

E E [(E - E) + (E - E) 

2=1 


with respect to (a fc ,/3 fe , 7 *,, 77 , 0, fc = 1, ■ • ■ ,p}, 

(27) 

(28) 

(29) 

(30) 

= 0, (31) 


Equations E71) - (E51) yield 


'E Y ‘ R =Y,n['E x u 

3 =1 Vi=l 


■ n6. 


2 — 1 


Meanwhile, equations (EH1) - (15U1) yield 


E 73 (E EE j + <? E E = E EE k = i,---, P . 

3 =1 


, 2=1 


2=1 


2=1 


Plugging ([3^1) into we obtain 


E'tt 

J =1 


^EEE (^EeI [l± x i 


2=1 


2=1 


2=1 


= -^EEE-EEE) ;Ee- * = 


(32) 


(33) 


(34) 


Writing equations (J34J) in matrix form yields 

(El¬ 


is obtained by plugging jj, j = 1 , • • • . p in equation 

□ 
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A.3 Proof of Corollary [5] 

Proof. From Theorem^ 7 = Si/Spi < 0, where Si and Si .1 are the sample covariance of X R and Y R , and 
the sample variance of X R , respectively. Namely, 

1 n 

Si = - J2 ( Y ; R - Y*) ( xR - X*) < 0, (35) 

i —1 

1 n 2 

= >0 - ( 36 ) 

i=1 


Let 



be the joint constrained LS estimates of [ 7 , 9] such that 7 > 0. Then, 


9 = Y r — 7 X R . 


It follows that 

Y R = jX R + § = jX R +(W-iX^j =W + i(x R -X^, i = 1, - ■ ■ ,n. 

Then the sum of squared errors for the prediction of Y R based on the constrained LS estimates is calculated 
to be 


n „ 

E ( y * - f ’ R ) 


n 

-i(x R -Y*) 


n 2 n 2 

= e { Y ’ R - w ) +7 2 E( x ^-^) 

1— 1 2—1 

n 2 

= E(^-^) +7 2 ^i,i- 27 nSi. 

2 — 1 

Therefore, in view of (f3^ - ([36j) . 


2 7E(7 fl - yiJ ) ( x ?- xR ) 
2—1 


n 2 n 2 

2=1 2 — 1 

and ‘ ” holds if and only if 7 = 0. This completes the proof. 


(37) 

□ 


A.4 Proof of Corollary [6] 

Proof. Under Assumption [T] 

E x « -> diag {Var (X R ) , ■ ■ • , Var (X R )} a.s., 

and 


S x h. y « -t [Cov(Xf,U fl ) ,Co v(X r ,Y r )] a.s.. 


It follows that 


T -»• 


Var (X R ) 


Var (X R ) 


Cov (X R , Y R )' ’ Cov (X R , Y R ) 

Equations (1381) and (15U1) together imply 

IjSj ->• Var (X R ) a.s., j = !,■■■ ,p. 


a.s.. 


and therefore, 


Hence, if Sj > °> 


P ( A ijSj > 0) —► 1, as n —> oo, j = 1, • • • ,p. 
P ( 7 j > 0) —>■ 1, as n —> oo. 


(38) 

(39) 


□ 
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A.5 Proof of Theorem [3] 

Proof. Notice that 


E 


(y r -y r )y r ] = e{e[(y r -y r )y r \X r ]} 


= E 
= 0 . 


[y?E (y r - Y R \X R ) 


Therefore, 


e(y*) 2 = e(y*-y*) + e(y r ) . 

This together with the fact that E (y^ = E (lA) yields 


E (y, R - t R ) 2 = E (Y r ) 2 - E (y r ) 2 = Var (Yf) - Var (f> R ) . 


Separately, 


E (y r - Y R ) = E (ef ) 2 = E (ef - ef) 5 
By Markov’s inequality, we have 


= 2a 2 . 


P 


(y r < o) < p (| y, 


R - y r \ >y h ) < 


E 


(y r -Y r )' 

m 2 


together with (l40]l and (|4lll proves the desired result. 


(40) 

(41) 


(42) 

□ 
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